Imputation of data values that are less than a detection limit.

Authors

  • Paul A Succop
  • Scott Clark
  • Mei Chen
  • Warren Galke

Abstract

Results of the analyses of occupational and environmental samples are frequently reported as "less than a specified value," a practice followed by many analytical laboratories. A left-censored distribution occurs when analytical laboratories do not report results that fall below their limits of detection or quantification. Approximately 37% of the household interior dust lead loadings collected in a large-scale, multisite, longitudinal study of lead-based paint hazard controls were reported to be below the "method detection limit." These unreported values cannot be used in any statistical analysis of the data and must be replaced by a valid dust lead loading estimate, a process called data imputation. This investigation tested how well values imputed with a newly formulated procedure for estimating data below the method detection limit correlated with the dust lead loadings that the participating laboratories reported on special request. These results were also compared with those obtained by imputing the minimum detectable level divided by the square root of 2. Imputation of the low lead loadings was accomplished by substituting the value associated with the median percentile below each laboratory's method detection limit. A correlation of r = 0.50 was calculated between the predicted and reported dust lead loadings, with only slight bias (2.9%) in the predicted values. An alternative imputation procedure that used the predicted values from structural equation models fit to the noncensored dust lead loadings performed about as well, although the predictions had to be "centered" to correspond to the censored data. An estimator that combined both of these imputation procedures only slightly improved the correlation between the predicted and laboratory values (r = 0.51). These results support using the new procedure rather than the commonly imputed values of the method detection limit divided by 2 or by the square root of 2.
Imputing values based on either of these common approaches may produce much more biased predictions for the censored data; for these data, the dust lead loadings were overestimated by 348%. The results also suggest that analytical laboratories should provide a numerical result for every analyzed sample, flagging those values that fall below the detection limit, since these results may be more accurate than any imputed value, particularly values produced by the commonly used method of dividing the minimum detection limit by the square root of 2.
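The abstract describes the new imputed value only as the one "associated with the median percentile below each laboratory's method detection limit." One way to read that: if a fraction p of a laboratory's results fall below its MDL, those results span percentiles 0 to p, so every nondetect receives the quantile at p/2 of a fitted distribution (with 37% censored, that is the 18.5th percentile). The sketch below follows that reading. The lognormal model, the regression-on-order-statistics fit with Blom plotting positions, the percent-bias definition, and all function names are assumptions for illustration, not the authors' published procedure. The conventional MDL/√2 substitution is included for comparison.

```python
import math
from statistics import NormalDist, fmean


def median_percentile_value(detected, n_censored):
    """Single value imputed for every nondetect in one laboratory's results.

    ASSUMPTIONS (not stated in the abstract): loadings are lognormal, all
    nondetects share one MDL and so occupy the lowest ranks, and mu/sigma are
    estimated by regressing log(detected) on normal scores.
    """
    n = len(detected) + n_censored
    std = NormalDist()
    logs = sorted(math.log(x) for x in detected)
    # Blom plotting positions; detected values take ranks n_censored+1 .. n.
    z = [std.inv_cdf((n_censored + i + 1 - 0.375) / (n + 0.25))
         for i in range(len(logs))]
    z_bar, y_bar = fmean(z), fmean(logs)
    sigma = (sum((a - z_bar) * (b - y_bar) for a, b in zip(z, logs))
             / sum((a - z_bar) ** 2 for a in z))
    mu = y_bar - sigma * z_bar
    p = n_censored / n  # fraction of results reported below the MDL
    # Censored results span percentiles 0..p; their median percentile is p/2.
    return math.exp(mu + sigma * std.inv_cdf(p / 2))


def sqrt2_substitution(mdl):
    """Conventional rule the abstract compares against: MDL / sqrt(2)."""
    return mdl / math.sqrt(2)


def percent_bias(predicted, reported):
    """Percent difference of mean predicted from mean reported (assumed definition)."""
    return 100 * (fmean(predicted) - fmean(reported)) / fmean(reported)


if __name__ == "__main__":
    # Hypothetical batch (illustrative numbers only): 6 detects, 4 nondetects, MDL = 5.
    detected = [5.5, 7.2, 9.8, 14.0, 21.5, 40.0]
    print("median-percentile value:", median_percentile_value(detected, 4))
    print("MDL/sqrt(2) value:", sqrt2_substitution(5.0))
```

Under this reading, the imputed value depends on how much of the batch is censored and on the shape of the fitted distribution, whereas MDL/2 and MDL/√2 are fixed fractions of the limit regardless of the data; that difference is what the abstract's bias comparison (2.9% versus 348%) reflects.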

Similar resources

Accuracy evaluation of different statistical and geostatistical censored data imputation approaches (Case study: Sari Gunay gold deposit)

Most geochemical datasets include missing data in varying proportions, which may cause a significant problem in geostatistical modeling or multivariate analysis of the data. Therefore, it is common to impute the missing data in most geochemical studies. In this study, three approaches called half detection (HD), multiple imputation (MI), and the cosimulation based on Markov model 2...

Missing data imputation in multivariable time series data

Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography, and finance. Many time series datasets contain missing data. Multivariate time series missing data imputation is a challenging topic and needs to be carefully considered before learning or predicting time series. Much research has been done on the use of diffe...

An Empirical Comparison of Performance of the Unified Approach to Linearization of Variance Estimation after Imputation with Some Other Methods

Imputation is one of the most common methods for reducing the effects of item nonresponse. Imputation yields a complete dataset, so naïve estimators can then be used. With most common imputation methods, the mean and total (imputation estimators) remain unbiased. However, their variances (imputation variances) are underestimated by naïve variance estimators. Sampling mechanism an...

Influence of Pattern of Missing Data on Performance of Imputation Methods: An Example from National Data on Drug Injection in Prisons

Background: Policymakers need models that can detect groups at high risk of HIV infection. Incomplete records and dirty data are frequently seen in national datasets. The presence of missing data challenges model development. Several studies suggested that the performance of imputation methods is acceptable when the missing rate is moderate. One of the issues which was of less concern...

Analysis of missing observations in a study of the effect of different doses of vitamin D supplementation on insulin resistance during pregnancy

Introduction: The aim of this study was to impute missing data and to compare the effect of different doses of vitamin D supplementation on insulin resistance during pregnancy. Methods: A clinical trial was conducted on 104 women with diabetes and a gestational age of less than 12 weeks between 1391 and...

Journal title:
  • Journal of occupational and environmental hygiene

Volume 1, Issue 7

Publication date: 2004